Make cudf.pandas proxy array picklable #17929
I'll continue once #17936 is merged. That will help ensure that the tests are passing with the changes in this PR.
/ok to test
This problem also makes me wonder what other methods our proxy numpy array is getting from `np.ndarray` instead of our `_FinalProxy` because `np.ndarray` is first in the MRO, but that's another problem for another day.
Sorry, this isn't quite ready to merge yet. Looking into the failures now.
Edit: I'll request another review from you @mroeschke when it's ready.
Thanks, we're probably okay now. The methods that
They were all accounted for except the two used for pickling.
/merge
def test_pickle_round_trip_proxy_numpy_array(array):
    arr, proxy_arr = array
    pickled_arr = BytesIO()
    pickled_proxy_arr = BytesIO()
    pickle.dump(arr, pickled_arr)
    pickle.dump(proxy_arr, pickled_proxy_arr)

    pickled_arr.seek(0)
    pickled_proxy_arr.seek(0)

    np.testing.assert_equal(
        pickle.load(pickled_proxy_arr), pickle.load(pickled_arr)
    )
TODO: This can be simplified a bit, I think. I'll tack it on to another PR.
def test_pickle_round_trip_proxy_numpy_array(array):
    arr, proxy_arr = array
    np.testing.assert_equal(
        pickle.loads(pickle.dumps(proxy_arr)),
        pickle.loads(pickle.dumps(arr)),
    )
Description
Part of #17490.

We employ custom pickling logic for our cudf.pandas wrapped types. The logic lets us serialize and deserialize proxy types by serializing and deserializing the underlying wrapped types (i.e., the type of `_fsproxy_wrapped`). This pickling logic is defined in `_FinalProxy`, which is the base class of all of our "final" proxy types.

The failures in the integration tests occurred because this pickling logic wasn't used for the proxy numpy array type. This is because the "final" proxy array type inherits from an additional base class: `ProxyNDarrayBase` (which contains logic to inherit from `np.ndarray`). It comes before `_FinalProxy` in the class's MRO, so the custom pickling is not used.

Additionally, the custom pickling logic used for other proxy types is incompatible with our proxy array. So this PR defines a custom function for handling proxy array serialization.
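
The MRO issue described above can be illustrated with a small sketch. This is not the cudf implementation: `ArrayBase`, `FinalProxy`, `BrokenProxyArray`, `FixedProxyArray`, and `_rebuild_proxy_array` are all hypothetical stand-ins, with `ArrayBase` playing the role of `np.ndarray` (a base class that brings its own pickling hook) and `FinalProxy` playing the role of `_FinalProxy`.

```python
import pickle


class ArrayBase:
    """Hypothetical stand-in for np.ndarray, which defines its own __reduce__."""

    def __reduce__(self):
        # Rebuilds a plain ArrayBase and ignores any proxy state.
        return (ArrayBase, ())


class FinalProxy:
    """Hypothetical stand-in for _FinalProxy: pickle via the wrapped object."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __reduce__(self):
        return (type(self), (self._wrapped,))


class BrokenProxyArray(ArrayBase, FinalProxy):
    """ArrayBase precedes FinalProxy in the MRO, so ArrayBase.__reduce__ wins."""


def _rebuild_proxy_array(wrapped):
    # Module-level helper so pickle can locate it by name when loading.
    return FixedProxyArray(wrapped)


class FixedProxyArray(ArrayBase, FinalProxy):
    """Defining __reduce__ on the class itself takes priority over both bases."""

    def __reduce__(self):
        return (_rebuild_proxy_array, (self._wrapped,))


# The proxy-specific pickling is silently skipped for the broken class...
broken = pickle.loads(pickle.dumps(BrokenProxyArray([1, 2, 3])))
# ...while the explicit override round-trips the wrapped object.
fixed = pickle.loads(pickle.dumps(FixedProxyArray([1, 2, 3])))
```

In this sketch, `broken` comes back as a bare `ArrayBase` without its wrapped data, while `fixed` comes back as a `FixedProxyArray` whose `_wrapped` attribute is intact. Defining the hook directly on the final class is one way to avoid depending on base-class order; whether the PR does exactly this for the real proxy array is not shown here.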